You said:
Give me a use case in excel vba and python pandas to demonstrate the speed of data processing in python
This is a copy of a conversation between ChatGPT & Anonymous.
Let’s create a real-world use case where Python (with Pandas) can showcase its speed advantage over Excel VBA for data processing.
Scenario:
We have a dataset of 1 million sales records with the following columns:
Date (daily sales date)
Region (East/West/North/South)
Product (e.g., 100 product codes)
Sales (random sales values)
Goal:
Calculate the total sales per region and product.
A worksheet tops out at 1,048,576 rows, so 1M records just fit, but Excel struggles at that scale; let's assume we load the data in a table (or split it across sheets).
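To make the target output concrete, here is a toy illustration of the aggregation on a handful of hypothetical rows (values are made up):

```python
import pandas as pd

# Tiny toy dataset (hypothetical values) to show the target output shape
toy = pd.DataFrame({
    'Region':  ['East', 'East', 'West', 'West', 'East'],
    'Product': ['P1',   'P2',   'P1',   'P1',   'P1'],
    'Sales':   [100,    200,    50,     70,     30],
})

# Total sales per (Region, Product) pair
result = toy.groupby(['Region', 'Product'], as_index=False)['Sales'].sum()
print(result)
# East/P1 -> 130, East/P2 -> 200, West/P1 -> 120
```

The full 1M-row versions in VBA and pandas below compute exactly this, just at scale.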
Here is the VBA code:
```vba
Sub AggregateSales()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim dict As Object
    Dim key As String

    Set ws = ThisWorkbook.Sheets("SalesData")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Set dict = CreateObject("Scripting.Dictionary")

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    For i = 2 To lastRow
        key = ws.Cells(i, 2).Value & "|" & ws.Cells(i, 3).Value ' Region|Product
        If dict.exists(key) Then
            dict(key) = dict(key) + ws.Cells(i, 4).Value
        Else
            dict.Add key, ws.Cells(i, 4).Value
        End If
    Next i

    ' Output results
    Dim r As Long
    r = 2
    For Each key In dict.Keys
        ws.Cells(r, 6).Value = Split(key, "|")(0)
        ws.Cells(r, 7).Value = Split(key, "|")(1)
        ws.Cells(r, 8).Value = dict(key)
        r = r + 1
    Next key

    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
End Sub
```
Expected Time: 5–20 minutes for 1 million rows, depending on PC performance.
```python
import pandas as pd
import numpy as np
import time

# Generate a sample dataset of 1 million rows
N = 10**6
data = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=N, freq='min'),  # 'min', not the deprecated 'T'
    'Region': np.random.choice(['East', 'West', 'North', 'South'], N),
    'Product': np.random.choice([f'P{i}' for i in range(100)], N),
    'Sales': np.random.randint(100, 1000, N)
})

# Time the aggregation
start = time.time()
result = data.groupby(['Region', 'Product'], as_index=False)['Sales'].sum()
end = time.time()

print(result.head())
print(f"Aggregation completed in {end - start:.2f} seconds")
```
Expected Time: 0.5–1.5 seconds on a typical i7 machine for 1 million rows.
VBA loops row by row, and every `ws.Cells(...)` read is a slow cell-by-cell call into the worksheet.
Pandas uses vectorized, C-optimized operations, which for this kind of aggregation is often orders of magnitude faster.
Pandas can also scale across multiple cores with libraries like Modin or Dask for even more speed.
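The loop-versus-vectorization gap can be seen even within Python itself. Here is a minimal sketch (smaller sample so the loop finishes quickly) contrasting a row-by-row dictionary accumulation, analogous to the VBA pass, with the vectorized groupby:

```python
import time
import numpy as np
import pandas as pd

N = 100_000  # smaller sample so the Python-level loop finishes quickly
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Region': rng.choice(['East', 'West', 'North', 'South'], N),
    'Product': rng.choice([f'P{i}' for i in range(100)], N),
    'Sales': rng.integers(100, 1000, N),
})

# Row-by-row accumulation, mirroring the VBA Scripting.Dictionary loop
start = time.perf_counter()
totals = {}
for region, product, sales in zip(df['Region'], df['Product'], df['Sales']):
    key = (region, product)
    totals[key] = totals.get(key, 0) + sales
loop_time = time.perf_counter() - start

# Vectorized groupby: the whole aggregation happens in optimized C code
start = time.perf_counter()
result = df.groupby(['Region', 'Product'])['Sales'].sum()
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s  groupby: {vec_time:.4f}s")
```

Both produce identical totals; the groupby simply avoids executing Python bytecode per row. The VBA loop is slower still, because each iteration also crosses the COM boundary to read cells.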
Would you like me to create a single Excel file with:
A 1M-row sample dataset
A pre-written VBA macro
And a Python script for direct comparison?
I can generate and give you the Excel file + Python script.
Here are the files for your VBA vs Python data processing speed demo:
📊 Excel File (100,000 rows sample):
👉 Download SalesData_100k.xlsx
🐍 Python Script (full 1 million rows):
👉 Download sales_aggregation_pandas.py
Open SalesData_100k.xlsx.
Press ALT + F11, insert a new module, and paste the AggregateSales macro shown above.
Run the macro and time it.
Install the required packages:

```bash
pip install pandas numpy
```

Run the script:

```bash
python sales_aggregation_pandas.py
```
Note the aggregation time in seconds.
Would you like me to:
Add a ready-made Excel macro version with the code already embedded?
Extend this to show charting or merging/filtering performance differences too?
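If the demo is later extended to filtering, the same vectorization argument applies: a boolean mask evaluates the condition for all rows in one pass, with no explicit loop. A minimal sketch (column names and the `Sales > 900` threshold are illustrative):

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N = 1_000_000
df = pd.DataFrame({
    'Region': rng.choice(['East', 'West', 'North', 'South'], N),
    'Sales': rng.integers(100, 1000, N),
})

# Vectorized boolean filter: one pass over all 1M rows, no explicit loop
start = time.perf_counter()
high_east = df[(df['Region'] == 'East') & (df['Sales'] > 900)]
elapsed = time.perf_counter() - start
print(f"kept {len(high_east)} rows in {elapsed:.3f}s")
```

An equivalent VBA pass would again visit each row and read each cell individually.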
The error you're encountering —
"Compile error: For Each control variable must be Variant or Object" —
occurs because the variable key used in the For Each loop is declared As String; when iterating over a Dictionary's keys, the control variable must be a Variant (or Object).
Declare key as Variant. Just change:

```vba
Dim key As String
```

to:

```vba
Dim key As Variant
```
```vba
Sub AggregateSales()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim i As Long
    Dim dict As Object
    Dim key As Variant

    Set ws = ThisWorkbook.Sheets("SalesData")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    Set dict = CreateObject("Scripting.Dictionary")

    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    For i = 2 To lastRow
        key = ws.Cells(i, 2).Value & "|" & ws.Cells(i, 3).Value ' Region|Product
        If dict.exists(key) Then
            dict(key) = dict(key) + ws.Cells(i, 4).Value
        Else
            dict.Add key, ws.Cells(i, 4).Value
        End If
    Next i

    Dim r As Long
    r = 2
    For Each key In dict.Keys
        ws.Cells(r, 6).Value = Split(key, "|")(0)
        ws.Cells(r, 7).Value = Split(key, "|")(1)
        ws.Cells(r, 8).Value = dict(key)
        r = r + 1
    Next key

    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
End Sub
```
Try running it again. If you'd like, I can send you an updated .xlsm file with this macro preloaded. Want that?